Epitome for Automatic Image Colorization
Abstract

Image colorization adds color to grayscale images. It 
not only increases the visual appeal of grayscale images, 
but also enriches the information contained in scientific im- 
ages that lack color information. Most existing methods of 
colorization require laborious user interaction for scribbles 
or image segmentation. To eliminate the need for human la- 
bor, we develop an automatic image colorization method 
using epitome. Built upon a generative graphical model, 
epitome is a condensed image appearance and shape model 
which also proves to be an effective summary of color infor- 
mation for the colorization task. We train the epitome from 
the reference images and perform inference in the epitome 
to colorize grayscale images, rendering better colorization 
results than [11] in our experiments.



1. Introduction 



Colorization adds color to grayscale images by assign- 
ing color values to images which only contain a grayscale 
channel. It not only increases the visual appeal, but also 
enhances the information conveyed by scientific images. 
For example, the grayscale images acquired by scanning 
electron microscopy (SEM) can be made more illustrative 
by adding different colors to different parts of the images. 
However, manual colorization is tedious and time-consuming, so it
is not suitable for batch processing. To overcome this problem, we
propose an automatic colorization method based on epitome. Figure 4
shows the colorization result for the Nano Mushroom-like image. We
train the epitome from one manually colorized Nano Mushroom-like
image, and use that epitome to automatically colorize the other
Nano Mushroom-like image, which eliminates the need for human labor
and makes batch colorization possible.

Based on the source of the color information used to colorize the
grayscale images, existing colorization techniques fall into two
main categories: user scribble based methods and color transfer
methods. The user scribble based method in [8] asked users to draw
color scribbles in the grayscale image, and the algorithm propagated
the user-provided color to the whole image, requiring that similar
neighboring pixels should receive similar color. Later, L. Qing et
al. [9] proposed a method which required less human intervention.
The user scribbles were employed for texture segmentation and the
user-provided color was propagated within each segment. Using a
similar color image as a reference, color transfer methods such as
[11] performed colorization by transferring the color from the
reference image to the grayscale image, either automatically or with
user intervention. However, the pixel-level matching based on
luminance value and neighborhood statistics adopted by [11] suffered
from spatial inconsistency, and user-provided swatches were required
to guide the matching process in many cases. The method in [5]
improved the spatial consistency by an image space voting scheme.
Their method first transferred color to a few pixels in the target
image with high confidence, then applied the method in [8] to
colorize the whole image, treating the colorized pixels in the first
step as the scribbles. However, their method required a robust
segmentation of the reference image, which was difficult in many
cases without user intervention.

Similar to [11], our automatic colorization method transfers the
color information from the reference image to the target grayscale
image. Since most existing colorization methods need user
interaction for color selection or segmentation, a robust and
automatic colorization algorithm is preferable. To approach this
problem, it is worthwhile to exploit the biological characteristics
of the human visual system. The average human retina contains many
more rods than cones [3] (92 million rods versus 4.6 million cones).
Rods are more sensitive than cones but they are not sensitive to
color, so that most of the visually significant variation arises
only from luminance differences. This fact suggests that we do not
need to search the whole reference image for the color patches to
colorize the target image; instead we can reduce the search space
for color patches, or equivalently find an effective color summary
of the reference image, to improve the efficiency and alleviate
color assignment ambiguity. In [11], such a summary is a set of
randomly sampled source color pixels, which is, however, subject to
noise in the raw pixels.

In order to find an effective and compact summary of the color
information in the reference image, we adopt the condensed image
appearance and shape representation, i.e. epitome [6]. Epitome
consolidates self-similar patches in the spatial domain, and the
size of the epitome is much smaller than that of the image it
models. By virtue of the generative graphical model, epitome can be
interpreted as a tradeoff between template and histogram for image
representation, and it has been applied to many computer vision
tasks such as object detection, location recognition and synthesis
[10]. Epitome summarizes a large number of raw patches in the
reference image by only representing the most constitutive elements.
In our epitomic colorization scheme, the color patches used to
colorize the target grayscale image are retrieved from the epitome
trained with the reference image, rather than from the raw image
patches. Epitome proves to be an effective summary of the color
information in the reference image, which produces more satisfactory
colorization results than [11] in the experiments.

The paper is arranged as follows. Section 2 describes 
the process of automatic colorization by epitome as well 
as the detailed formulation of training the epitome and in- 
ference in the epitome graphical model, especially on how 
epitome summarizes the raw image patches of the reference 
image into a condensed representation and how inference 
is performed in epitome to automatically colorize the target 
grayscale image. Section 3 shows the colorization results, 
and we conclude the paper in Section 4. 

2. Formulation 

2.1. Description of Automatic Colorization by Epitome

Given a reference color image cl and the target grayscale 
image gl, we aim to automatically colorize gl with the 
color information from cl. We achieve this goal by first 
training an epitome e from the reference image, then per- 
forming inference in e so as to transfer the color informa- 
tion of the color patches of e to the corresponding grayscale 
patches of gl. Note that the grayscale channel of gl is re- 
tained as the luminance channel after the color transfer pro- 
cess. We will illustrate the training and inference process in 
detail in the following subsections. 
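
The luminance-retention step can be illustrated for a single pixel. The following is a minimal sketch, assuming RGB values scaled to [0, 1] and the YIQ conversion provided by Python's standard `colorsys` module; the paper does not state which conversion matrices it uses, and `recolor_pixel` is a hypothetical helper name introduced only for this illustration.

```python
import colorsys

def recolor_pixel(gray, ref_rgb):
    """Keep the target luminance and borrow chrominance from a reference color.

    gray:    target grayscale value in [0, 1], used as the Y channel.
    ref_rgb: (r, g, b) color in [0, 1] that supplies the I and Q channels.
    """
    _, i, q = colorsys.rgb_to_yiq(*ref_rgb)   # discard the reference luminance
    return colorsys.yiq_to_rgb(gray, i, q)    # colorsys clamps the result to [0, 1]

rgb = recolor_pixel(0.5, (0.8, 0.4, 0.2))
y, _, _ = colorsys.rgb_to_yiq(*rgb)
print(rgb, y)  # y stays close to the input luminance 0.5 when no clamping occurs
```

Only the I and Q values change, so the structure visible in the grayscale image is preserved after color transfer.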

2.2. Training the Epitome 

Epitome is a latent representation of an image, which comprises
hidden variables and parameters required to generate the image
patches according to the epitome graphical model. Epitome summarizes
a large set of raw image patches into a condensed representation of
a size much smaller than the original image, and it approaches this
goal in a manner similar to a Gaussian mixture model with
overlapping means and variances.

The epitome $e$ of an image $I$ of size $M \times N$ is a condensed
representation of size $M_e \times N_e$, where $M_e < M$ and
$N_e < N$. The epitome contains two parameters: $e = (\mu, \phi)$.
$\mu$ and $\phi$ represent the Gaussian mean and variance
respectively, and both are of size $M_e \times N_e$. Suppose $Q$
patches are sampled from the reference image, i.e.
$\{Z_k\}_{k=1}^{Q}$, and each patch $Z_k$ contains pixels with image
coordinates $S_k$. Similar to [6], the patches are square and we use
a fixed patch size throughout this paper. These patches are densely
sampled and they can overlap with each other to cover the entire
image. We associate each patch $Z_k$ with a hidden mapping $T_k$
which maps the image coordinates $S_k$ to the epitome coordinates,
and all the $Q$ patches are generated independently from the epitome
parameters and the corresponding hidden mappings as below:

$$ p(Z_k \mid T_k, e) = \prod_{i \in S_k} \mathcal{N}\left(z_{i,k};\, \mu_{T_k(i)},\, \phi_{T_k(i)}\right), \quad k = 1, \ldots, Q \tag{1} $$

where $z_{i,k}$ is the pixel with image coordinates $i$ from the
$k$-th patch. Since $z_{i,k}$ is independent of the patch number
$k$, we simply denote it as $z_i$ in the following text.
$\mathcal{N}(\cdot\,; \mu, \phi)$ represents a Gaussian distribution
with mean $\mu$ and variance $\phi$.

Based on (1), the hidden mapping $T_k$ can be interpreted
as a hidden variable that indicates the location of the epitome
patch from which the observed image patch is generated, and it
behaves similarly to the hidden variable in traditional Gaussian
mixture models that specifies the Gaussian component from which a
specific data point is generated. Also, $T_k$ maps the image patch
to its corresponding epitome patch, and the number of possible
mappings that each $T_k$ can take, denoted as $L$, is determined by
all the discrete locations in the epitome ($L = M_e \times N_e$ in
our setting). Figure 1 illustrates the role that the hidden mapping
variables play in the generative model, and Figure 2 shows the
epitome graphical model, which again demonstrates its similarity to
Gaussian mixture models. $\pi = \{\pi_l\}_{l=1}^{L}$ indicates the
prior distribution of the hidden mapping. Suppose $T_{k,l}$ is the
$l$-th mapping that $T_k$ can take, then

$$ p(T_k) = \prod_{l=1}^{L} \pi_l^{\delta(T_k = T_{k,l})} \tag{2} $$

which holds for any $k \in \{1, \ldots, Q\}$. $\delta$ is an
indicator function that equals 1 when its argument is true, and 0
otherwise.

Figure 1. The mapping $T_k$ maps the image patch to its
corresponding epitome patch of the same size; the image patch can be
mapped to any possible epitome patch according to $T_k$.

Figure 2. The epitome graphical model.
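
The patch likelihood under a given hidden mapping can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: it assumes that each mapping is a translation identified by the top-left corner of the epitome patch and that indices wrap around the epitome border (so that all $M_e \times N_e$ locations are valid mappings). The names `gaussian_logpdf` and `patch_loglik` are hypothetical.

```python
import math

def gaussian_logpdf(z, mean, var):
    """Log density of a scalar Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (z - mean) ** 2 / var)

def patch_loglik(patch, mu, phi, offset):
    """log p(Z_k | T_k, e): sum of per-pixel Gaussian log densities.

    patch:  K x K list of pixel values.
    mu/phi: M_e x N_e lists holding the epitome means and variances.
    offset: (row, col) of the epitome patch's top-left corner; indices wrap
            around the epitome border (an assumption of this sketch).
    """
    rows, cols = len(mu), len(mu[0])
    r0, c0 = offset
    total = 0.0
    for dr, row in enumerate(patch):
        for dc, z in enumerate(row):
            r, c = (r0 + dr) % rows, (c0 + dc) % cols
            total += gaussian_logpdf(z, mu[r][c], phi[r][c])
    return total

# Toy 3 x 3 epitome with unit variances and a 2 x 2 patch copied from offset (1, 1).
mu = [[0.0, 0.1, 0.2], [0.3, 0.4, 0.5], [0.6, 0.7, 0.8]]
phi = [[1.0] * 3 for _ in range(3)]
patch = [[0.4, 0.5], [0.7, 0.8]]
scores = {(r, c): patch_loglik(patch, mu, phi, (r, c))
          for r in range(3) for c in range(3)}
best = max(scores, key=scores.get)
print(len(scores), best)
```

In this toy case there are $L = 9$ candidate mappings, and the offset from which the patch was copied receives the highest log-likelihood. Working in the log domain avoids underflow when the product runs over many pixels.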

Our goal is to find the epitome $\hat{e}$ that maximizes the
log-likelihood function:

$$ \hat{e} = \arg\max_{e} \log p\left(\{Z_k\}_{k=1}^{Q} \mid e\right) \tag{3} $$

Given the epitome $e$, the likelihood function for the complete
data, i.e. the image patches $\{Z_k\}_{k=1}^{Q}$ and the hidden
mappings $\{T_k\}_{k=1}^{Q}$, is derived below according to the
epitome graphical model:

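$$
\begin{aligned}
p\left(\{Z_k, T_k\}_{k=1}^{Q} \mid e, \pi\right)
  &= \prod_{k=1}^{Q} p(Z_k, T_k \mid e, \pi) \\
  &= \prod_{k=1}^{Q} p(T_k)\, p(Z_k \mid T_k, e) \\
  &= \prod_{k=1}^{Q} \prod_{l=1}^{L} \left[\pi_l\, p(Z_k \mid T_k = T_{k,l}, e)\right]^{\delta(T_k = T_{k,l})}
\end{aligned}
\tag{4}
$$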



We use the Expectation-Maximization algorithm [4] to maximize the
likelihood function (3) and learn the epitome $e$, following the
procedure introduced in [1].

The E-step: The posterior distribution of the hidden variables, i.e.
the hidden mappings, is

$$ q(T_k) = p(T_k \mid Z_k, e, \pi) = \frac{p(Z_k \mid T_k, e)\, p(T_k)}{\sum_{T_k} p(Z_k \mid T_k, e)\, p(T_k)} \tag{5} $$

We observe that $q(T_k)$ corresponds to the responsibility in
Gaussian mixture models.
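
As a rough illustration of the E-step, the posterior over the candidate mappings of one patch can be computed from per-mapping log-likelihoods and the prior. This is a sketch under the assumption that the log-likelihoods have already been evaluated; the max-subtraction trick for numerical stability is a common implementation choice rather than something stated in the paper, and `mapping_posterior` is a hypothetical name.

```python
import math

def mapping_posterior(logliks, prior):
    """Posterior q(T_k) over the L candidate mappings of one patch.

    logliks: log p(Z_k | T_k = T_{k,l}, e) for each of the L mappings.
    prior:   prior probability pi_l of each mapping (all positive).
    """
    scores = [ll + math.log(p) for ll, p in zip(logliks, prior)]
    top = max(scores)                          # subtract the max to avoid underflow
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return [w / norm for w in weights]

q = mapping_posterior([-1200.0, -1203.0, -1250.0], [1 / 3, 1 / 3, 1 / 3])
print(q)
```

Exponentiating log-likelihoods of this magnitude directly would underflow to zero, which is why the normalization is carried out relative to the largest score.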

The M-step: We obtain the expectation of the log-likelihood function
for the complete data with respect to the posterior distribution of
the hidden mapping from the E-step as below:

$$ \mathbb{E}_{q}\left[\log p\left(\{Z_k, T_k\}_{k=1}^{Q} \mid e, \pi\right)\right] = \sum_{k=1}^{Q} \sum_{l=1}^{L} q(T_k = T_{k,l}) \cdot \left[\log \pi_l + \log p(Z_k \mid T_k = T_{k,l}, e)\right] \tag{6} $$
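Maximizing (6) with respect to $(e, \pi)$, we get the following
update of the parameters of the epitome and $\pi$:

$$ \mu_j = \frac{\sum_{k=1}^{Q} \sum_{l=1}^{L} q(T_k = T_{k,l}) \sum_{i \in S_k} \delta(T_{k,l}(i) = j)\, z_i}{\sum_{k=1}^{Q} \sum_{l=1}^{L} q(T_k = T_{k,l}) \sum_{i \in S_k} \delta(T_{k,l}(i) = j)} \tag{7} $$

$$ \phi_j = \frac{\sum_{k=1}^{Q} \sum_{l=1}^{L} q(T_k = T_{k,l}) \sum_{i \in S_k} \delta(T_{k,l}(i) = j)\, (z_i - \mu_j)^2}{\sum_{k=1}^{Q} \sum_{l=1}^{L} q(T_k = T_{k,l}) \sum_{i \in S_k} \delta(T_{k,l}(i) = j)} \tag{8} $$

$$ \pi_l = \frac{1}{Q} \sum_{k=1}^{Q} q(T_k = T_{k,l}) \tag{9} $$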



The index $j$ indicates the epitome coordinates in (7) and (8). We
alternate between the E-step and the M-step until convergence or
until the maximum number of iterations (20 in our experiments) is
reached, and then obtain the resultant epitome $e$ from the
reference image cl.
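
The parameter updates of the M-step amount to posterior-weighted averages over all patch pixels that land on a given epitome location. The sketch below shows this for a simplified one-dimensional epitome with wrap-around translation mappings; the 1-D setting, the variance floor `min_var`, and the function name `m_step` are assumptions made for brevity and are not taken from the paper.

```python
def m_step(patches, q, mu, phi, min_var=1e-6):
    """One M-step for a 1-D epitome with wrap-around translation mappings.

    patches: Q patches, each a list of K pixel values.
    q:       q[k][l], posterior of patch k starting at epitome index l.
    mu/phi:  current means and variances (length L); entries that receive
             no posterior mass keep their old values.
    """
    L, Q = len(mu), len(patches)
    weight = [0.0] * L   # total posterior mass landing on each epitome index
    first = [0.0] * L    # posterior-weighted sum of pixel values
    for k, patch in enumerate(patches):
        for l in range(L):
            for i, z in enumerate(patch):
                j = (l + i) % L
                weight[j] += q[k][l]
                first[j] += q[k][l] * z
    new_mu = [first[j] / weight[j] if weight[j] > 0 else mu[j] for j in range(L)]
    second = [0.0] * L   # posterior-weighted squared deviation from the new mean
    for k, patch in enumerate(patches):
        for l in range(L):
            for i, z in enumerate(patch):
                j = (l + i) % L
                second[j] += q[k][l] * (z - new_mu[j]) ** 2
    new_phi = [max(second[j] / weight[j], min_var) if weight[j] > 0 else phi[j]
               for j in range(L)]
    new_pi = [sum(q[k][l] for k in range(Q)) / Q for l in range(L)]
    return new_mu, new_phi, new_pi

# Two patches of length 2 mapped with certainty to epitome offsets 0 and 1.
patches = [[1.0, 2.0], [4.0, 5.0]]
q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
new_mu, new_phi, new_pi = m_step(patches, q, [0.0] * 3, [1.0] * 3)
print(new_mu, new_phi, new_pi)
```

In the toy call, epitome index 1 is covered by both patches, so its mean is the average of the two overlapping pixel values and its variance reflects their disagreement. A full training loop would alternate this step with the posterior computation of the E-step.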



Note that the above training process is applicable for a single type
of feature of cl. We use two types of features to train the epitome,
i.e. the YIQ channels and the dense SIFT feature [7]. We convert cl
from the RGB color space to the YIQ color space, where the Y channel
represents the luminance and the IQ channels represent the
chrominance information. Moreover, a dense SIFT feature is computed
for each sampled patch. A $K \times K$ patch is evenly divided into
$R \times R$ grids, and the orientation histogram of the gradients
with 8 bins is calculated for each grid, which results in an
$8R^2$-dimensional dense SIFT feature vector for each patch. $R$ is
typically set as 3 or 4. We then train the epitome
$e = (e^{YIQ}, e^{dsift})$ for the YIQ channels and the dense SIFT
feature, and the epitome for the YIQ channels ($e^{YIQ}$) shares the
same hidden mapping with the epitome for the dense SIFT feature
($e^{dsift}$) in the inference process [10]:



$$ p(Z_k \mid T_k, e) = p\left(Z_k^{YIQ} \mid T_k, e^{YIQ}\right)^{\lambda}\, p\left(Z_k^{dsift} \mid T_k, e^{dsift}\right)^{1-\lambda} \tag{10} $$

where $Z_k^{YIQ}$ and $Z_k^{dsift}$ represent the YIQ channels and
the dense SIFT feature of patch $Z_k$ respectively, and $e^{YIQ}$
and $e^{dsift}$ represent the epitomes trained from the YIQ channels
and the dense SIFT feature of cl respectively. $0 < \lambda < 1$ is
a parameter balancing the preference between the color and the dense
SIFT feature.
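
The per-patch gradient feature described above can be sketched as follows. This is a simplified stand-in for a real dense SIFT implementation: it uses central differences with clamped borders and hard bin assignment, and it omits the Gaussian weighting, interpolation, and normalization of standard SIFT. The function name `orientation_descriptor` is hypothetical, and $K$ is assumed to be divisible by $R$.

```python
import math

def orientation_descriptor(patch, R, bins=8):
    """Concatenate a gradient orientation histogram for each of R x R cells.

    patch: K x K list of luminance values (K assumed divisible by R).
    Returns a list of length bins * R * R; each gradient votes with its
    magnitude into the orientation bin of the cell that contains the pixel.
    """
    K = len(patch)
    cell = K // R
    hist = [0.0] * (bins * R * R)
    for r in range(K):
        for c in range(K):
            # Central differences, clamped at the patch border.
            gx = patch[r][min(c + 1, K - 1)] - patch[r][max(c - 1, 0)]
            gy = patch[min(r + 1, K - 1)][c] - patch[max(r - 1, 0)][c]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2.0 * math.pi)
            b = min(int(angle / (2.0 * math.pi) * bins), bins - 1)
            cell_index = min(r // cell, R - 1) * R + min(c // cell, R - 1)
            hist[cell_index * bins + b] += mag
    return hist

# A 12 x 12 horizontal ramp: every gradient points along the positive x axis.
ramp = [[float(c) for c in range(12)] for _ in range(12)]
feature = orientation_descriptor(ramp, R=3)
print(len(feature))
```

With $K = 12$ and $R = 3$ the vector has $8 \times 3^2 = 72$ entries, and $R = 4$ would give the familiar 128 dimensions.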

2.3. Colorization by Epitome 

With the epitome $e$ learnt from the reference image, we colorize
the target grayscale image gl by inference in the epitome graphical
model. Similar to the epitome training process, we densely sample
$Q$ patches $\{Z_k\}_{k=1}^{Q}$ from gl (these patches cover the
entire gl). With the hidden mapping associated with patch $Z_k$
denoted as $T_k$, the most probable mapping of the patch $Z_k$, i.e.
$T_k^{*}$, is formulated as below:



$$ T_k^{*} = \arg\max_{T_k} p(T_k \mid Z_k, e, \pi) \tag{11} $$

which is essentially the same as the E-step (5). We take the
grayscale channel of gl as the luminance channel (Y channel) of
itself. Since the color information (IQ channels) is absent in gl,
we only use the epitomes corresponding to the Y channel and the
dense SIFT feature to evaluate the right hand side of (11). The
color information is then transferred from the epitome patch, whose
location is specified by $T_k^{*}$, to the grayscale patch $Z_k$. We
denote the target image after colorization as $gl^{c}$. Since
$\{Z_k\}_{k=1}^{Q}$ can overlap with each other, the final color
(the value of the IQ channels) of a pixel $i$ in image $gl^{c}$ is
averaged according to:

$$ gl^{c}(i) = \frac{\sum_{k=1}^{Q} \sum_{j \in S_k} e_{T_k^{*}(j)}\, \delta(j = i)}{\sum_{k=1}^{Q} \sum_{j \in S_k} \delta(j = i)} \tag{12} $$

where $S_k$ is the image coordinates of patch $Z_k$ and
$e_{T_k^{*}(j)}$ represents the value of the IQ channels in the
epitome $e$ at location $T_k^{*}(j)$.
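
The averaging of chrominance over overlapping patches can be sketched as an accumulate-and-divide pass over the image. The example assumes that the (I, Q) values of each best-matching epitome patch have already been looked up; the data layout and the name `average_chroma` are hypothetical choices for this illustration.

```python
def average_chroma(height, width, placed):
    """Average the (I, Q) values proposed by overlapping patches, per pixel.

    placed: list of (top, left, iq) where iq[r][c] is the (I, Q) pair copied
            from the epitome for the patch pixel at image position
            (top + r, left + c).
    Returns a height x width grid of (I, Q) pairs; pixels covered by no
    patch get (0.0, 0.0), i.e. they stay gray.
    """
    total_i = [[0.0] * width for _ in range(height)]
    total_q = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top, left, iq in placed:
        for r, row in enumerate(iq):
            for c, (i_val, q_val) in enumerate(row):
                total_i[top + r][left + c] += i_val
                total_q[top + r][left + c] += q_val
                count[top + r][left + c] += 1
    return [[(total_i[r][c] / count[r][c], total_q[r][c] / count[r][c])
             if count[r][c] else (0.0, 0.0)
             for c in range(width)]
            for r in range(height)]

# A 1 x 3 image covered by two 1 x 2 patches that overlap on the middle pixel.
placed = [(0, 0, [[(0.2, 0.0), (0.4, 0.1)]]),
          (0, 1, [[(0.0, 0.3), (0.6, 0.2)]])]
chroma = average_chroma(1, 3, placed)
print(chroma)
```

The middle pixel receives the mean of the two proposals, while the outer pixels keep the single value they were assigned. Combined with the retained Y channel, the averaged I and Q values give the final color of each pixel.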

3. Experimental Results 

We show colorization results in this section. As mentioned in
Section 2, we use square patches of size $K \times K$, and the size
of the epitome is half of the size of the reference image. We
densely sample patches with a horizontal and vertical gap of
$\omega K$ pixels, where $\omega$ is a parameter in $[0, 1]$ that
controls the number of sampled patches.
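
The effect of the sampling gap on the number of patches can be made concrete with a short sketch. It assumes that the gap is rounded to at least one pixel and that an extra patch is added at the image border whenever the regular grid would leave pixels uncovered; both choices, and the name `patch_origins`, are assumptions of this illustration rather than details given in the paper.

```python
def patch_origins(height, width, K, omega):
    """Top-left corners of K x K patches sampled with a gap of omega * K pixels.

    Assumes height >= K and width >= K.
    """
    step = max(1, int(round(omega * K)))      # at least one pixel between samples

    def axis(n):
        starts = list(range(0, n - K + 1, step))
        if starts[-1] != n - K:               # extra patch so the border is covered
            starts.append(n - K)
        return starts

    return [(r, c) for r in axis(height) for c in axis(width)]

origins = patch_origins(30, 40, K=12, omega=0.5)
print(len(origins))
```

A smaller gap yields more overlapping patches and therefore more samples for training and more proposals to average during colorization, at a higher computational cost.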

Figure 3 shows the result of colorization for the dog image. We
convert the original image to grayscale as the target image. The
patch size is $12 \times 12$ and the parameter $\lambda$ balancing
between the color and the dense SIFT feature is 0.5. We compare our
method to [11], which transfers color from the reference image to
the target image by pixel-level matching. The result produced by
[11] lacks spatial continuity and we observe small artifacts
throughout the whole image. In contrast, our method renders a
colorized image very similar to the ground truth. This example also
demonstrates that the learnt epitome, which is a summary of a large
number of sampled patches, contains sufficient color information for
colorization.

Figures 4 and 5 show the colorization results for the Nano
Mushroom-like images and the cheetah. The patch size is chosen as
$12 \times 12$ and $15 \times 15$ respectively, and $\lambda$ is set
to 0.8 for both cases. The method in [11] still generates artifacts
around the top and bottom of the Mushroom-like structure, while our
method produces a much more spatially coherent result. Moreover, we
transfer the correct color for the cheetah to the target image,
which results in a more natural colorization result than that of
[11].

4. Conclusion 

We present an automatic colorization method using epitome in this
paper. While most existing colorization methods require tedious and
time-consuming user intervention for scribbles or segmentation, our
epitomic colorization method is automatic. Epitomic colorization
exploits the color redundancy by summarizing the color information
in the reference image into a condensed image shape and appearance
representation. Experimental results show the effectiveness of our
method.




Figure 3. The result of colorizing the dog. From left to right: the reference image, the target image (obtained by converting the reference
image to grayscale), the result by [11], and our result.




Figure 4. The result of colorizing the Nano Mushroom-like images.

Figure 5. The result of colorizing the cheetah.